Principal Curves: Learning
Author
Abstract
… and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science) complies with the regulations of this University and meets the accepted standards with respect to originality and quality.
The subjects of this thesis are unsupervised learning in general, and principal curves in particular. Principal curves were originally defined by Hastie [Has84] and Hastie and Stuetzle [HS89] (hereafter HS) to formally capture the notion of a smooth curve passing through the "middle" of a d-dimensional probability distribution or data cloud. Based on the definition, HS also developed an algorithm for constructing principal curves of distributions and data sets. The field has been very active since Hastie and Stuetzle's groundbreaking work. Numerous alternative definitions and methods for estimating principal curves have been proposed, and principal curves have been further analyzed and compared with other unsupervised learning techniques. Several applications in areas including image analysis, feature extraction, and speech processing have demonstrated that principal curves are not only of theoretical interest but also have a legitimate place in the family of practical unsupervised learning techniques.
Although the concept of principal curves as considered by HS has several appealing characteristics, a complete theoretical analysis of the model appears to be rather hard. This motivated us to redefine principal curves in a manner that allowed us to carry out extensive theoretical analysis while preserving the informal notion of principal curves. Our first contribution to the area is, hence, a new theoretical model that is analyzed using tools of statistical learning theory. Our main result here is the first known consistency proof of a principal curve estimation scheme. The theoretical model proved to be too restrictive to be practical. However, it inspired the design of a new practical algorithm to estimate principal curves based on data. The polygonal line algorithm, which compares favorably with previous methods both in terms of performance and computational complexity, is our second contribution to the area of principal curves. To complete the picture, in the last part of the thesis we consider an application of the polygonal line algorithm to handwritten character skeletonization.
Acknowledgments
I would like to express my deep gratitude to my advisor, Adam Krzyżak, for his help, trust and invaluable professional support. He suggested the problem, and guided me through the stages of this research. My great appreciation goes to Tamás Linder for leading me through …
Similar resources
Principal curves with bounded turn
Principal curves, like principal components, are a tool used in multivariate analysis for ends like feature extraction. Defined in their original form, principal curves need not exist for general distributions. The existence of principal curves with bounded length for any distribution that satisfies some minimal regularity conditions has been shown. We define principal curves with bounded turn,...
Locally Defined Principal Curves and Surfaces
Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric underst...
Regularization-free principal curve estimation
Principal curves and manifolds provide a framework to formulate manifold learning within a statistical context. Principal curves define the notion of a curve passing through the middle of a distribution. While the intuition is clear, the formal definition leads to some technical and practical difficulties. In particular, principal curves are saddle points of the mean-squared projection distance...
A Polygonal Line Algorithm for Constructing Principal Curves
Principal curves have been defined as “self consistent” smooth curves which pass through the “middle” of a d-dimensional probability distribution or data cloud. Recently, we [1] have offered a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given...
Learning and Design of Principal Curves
Principal curves have been defined as “self consistent” smooth curves which pass through the “middle” of a d-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the ...
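Illustrative sketch
Several of the abstracts above describe principal curves as curves of a given length that minimize the expected squared distance between the curve and points drawn from a distribution. The following is a minimal sketch of the empirical version of that criterion for a polygonal line, not the polygonal line algorithm itself. The function names (`sq_dist_to_segment`, `empirical_sq_distance`, `polyline_length`) and the representation of a curve as a list of vertices are assumptions made for illustration only.

```python
import math


def sq_dist_to_segment(p, a, b):
    """Squared Euclidean distance from point p to the line segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        # Degenerate segment (a == b): measure the distance to the point a.
        t = 0.0
    else:
        # Projection parameter of p onto the line, clamped to the segment.
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return sum((pi - ci) ** 2 for pi, ci in zip(p, closest))


def empirical_sq_distance(points, vertices):
    """Mean squared distance from the data points to a polygonal line.

    Hypothetical helper: the sample average stands in for the expected
    squared distance mentioned in the abstracts.
    """
    segments = list(zip(vertices[:-1], vertices[1:]))
    total = sum(
        min(sq_dist_to_segment(p, a, b) for a, b in segments) for p in points
    )
    return total / len(points)


def polyline_length(vertices):
    """Total length of the polygonal line (the quantity being constrained)."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


if __name__ == "__main__":
    data = [(0.0, 1.0), (1.0, -1.0), (3.0, 0.0)]
    curve = [(0.0, 0.0), (2.0, 0.0)]
    print(empirical_sq_distance(data, curve), polyline_length(curve))
```

Under this reading, a candidate polygonal line would be scored by `empirical_sq_distance` while `polyline_length` is kept within the allowed length; how the vertices are actually optimized is not covered here.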